Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation
Abstract
We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDPs) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in analyzing algorithms in the linear MDP setting, the understanding of more general transition models remains very restrictive. In this paper, we propose a provably efficient RL algorithm for MDPs whose state transition is given by a multinomial logistic model. We show that our proposed algorithm, based on upper confidence bounds, achieves an O(d√(H^3 T)) regret bound, where d is the dimension of the transition core, H is the horizon, and T is the total number of steps. To the best of our knowledge, this is the first model-based RL algorithm with multinomial logistic function approximation that has provable guarantees. We also comprehensively evaluate our proposed algorithm numerically and show that it consistently outperforms existing methods, hence achieving both provable efficiency and superior practical performance.
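As a rough illustration of what a multinomial logistic transition model looks like, the sketch below computes next-state probabilities as a softmax over inner products between feature vectors and a parameter vector. The function name `mnl_transition_probs`, the per-next-state feature layout, and the toy numbers are assumptions made for this example; the abstract does not specify the exact feature map, and the paper's algorithm (estimation and confidence bounds) is not reproduced here.

```python
import math


def mnl_transition_probs(features, theta):
    """Softmax (multinomial logistic) probabilities over candidate next states.

    features: list of feature vectors, one per candidate next state, for a
              fixed state-action pair (hypothetical layout for illustration).
    theta:    parameter vector playing the role of the transition core.
    """
    # Inner product of each feature vector with the parameter vector.
    logits = [sum(f * t for f, t in zip(phi, theta)) for phi in features]
    # Subtract the maximum logit for numerical stability; this does not
    # change the resulting probabilities.
    shift = max(logits)
    weights = [math.exp(z - shift) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: three candidate next states with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
theta = [0.5, -0.5]
probs = mnl_transition_probs(features, theta)
```

In this toy setup the probabilities sum to one, and a next state whose features align more strongly with `theta` receives more mass; in the setting of the abstract, `theta` is unknown and must be learned from observed transitions.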
Similar resources
Residual Algorithms: Reinforcement Learning with Function Approximation
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can easily become unstable when implemented directly with a general function-approximation system, such as a sigmoidal multilayer perceptron, a radial-basis-function system, a memory-based learning syst...
Reinforcement Learning and Function Approximation
Relational reinforcement learning combines traditional reinforcement learning with a strong emphasis on a relational (rather than attribute-value) representation. Earlier work used relational reinforcement learning on a learning version of the classic Blocks World planning problem (a version where the learner does not know what the result of taking an action will be). “Structural” learning resu...
Fuzzy Kanerva-based function approximation for reinforcement learning
Radial Basis Functions and Kanerva Coding can give poor performance when applied to large-scale multi-agent systems. In this paper, we attempt to solve a collection of predator-prey pursuit instances and argue that the poor performance is caused by frequent prototype collisions. We show that dynamic prototype allocation and adaptation can give better results by reducing these collisions. We the...
Function Approximation in Hierarchical Relational Reinforcement Learning
Recently there have been a number of different approaches developed for hierarchical reinforcement learning in the propositional setting. We propose a hierarchical version of relational reinforcement learning (HRRL). We describe a value function approximation method inspired by logic programming which is suitable for HRRL.
Decision Tree Function Approximation in Reinforcement Learning
We present a decision tree based approach to function approximation in reinforcement learning. We compare our approach with table lookup and a neural network function approximator on three problems: the well known mountain car and pole balance problems as well as a simulated automobile race car. We find that the decision tree can provide better learning performance than the neural network funct...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i7.25964